A morphogenic protein known as Dorsal patterns the embryonic dorsoventral body axis of Drosophila by binding
to transcriptional enhancers across the genome. Each such enhancer activates a neighboring gene at a
unique threshold concentration of Dorsal. The presence of Dorsal binding site clusters in these enhancers and 
of similar clusters in other enhancers has motivated models of threshold-encoding in site density. However, 
we found that the precise length of a spacer separating a pair of specialized Dorsal and Twist binding sites 
determines the threshold-response. Despite this result, the functional range determined by this spacer ele- 
ment as well as the role and origin of its surrounding Dorsal site cluster remained completely unknown. Here, 
we experiment with enhancers from diverse Drosophila genomes, including the large uncompacted genomes 
from ananassae and willistoni, and report three major interdependent results. First, we map the functional 
range of the threshold-encoding spacer variable. Second, we show that the majority of sites at the cluster are 
non-functional divergent elements that have been separated beyond the encoding's functional range. Third, 
we verify an evolutionary model involving the frequent replacement of a threshold encoding, whose precision 
is easily outdated by shifting accuracy. The process by which encodings are replaced by newer ones is fa- 
cilitated by the palindromic nature of the Dorsal and Twist binding motifs and by intrinsic repeat-instability in 
the specialized Twist binding site, which critically impacts the length of the spacer linking it to Dorsal. Over 
time, the dynamic process of selective deprecation and replacement of encodings adds to a growing cluster 
of deadened elements, or necro-elements, and strongly biases local sequence composition. Necro-element 
plaques are associated with mature enhancers that are older than 10 My but not with newer lineage-specific 
enhancers that employ identical logic. We conclude that the clustered signature of most enhancers results 
from long histories of selective "maintenance" of precise encodings via facile deprecation and equally facile 
replacement. 



Introduction 

Nothing Gold Can Stay 

Nature's first green is gold, 
Her hardest hue to hold. 
Her early leaf's a flower; 
But only so an hour. 
Then leaf subsides to leaf. 
So Eden sank to grief, 
So dawn goes down to day, 
Nothing gold can stay. 

— Robert Frost, New Hampshire (1923) 

How genetic information is encoded in DNA is a 
central question in biology. In many cases, natural 
selection acts efficaciously on regulatory DNA se- 
quences, which specify the precise conditions under 
which a gene product is made by a cell [BfTT]. However,
unlike the precise protein-encoding scheme,
few general principles have emerged for regulatory 
encoding. The identification of such principles 
would facilitate understanding of genomic regulatory 
DNAs and advance many areas of biological investi- 
gation. 

One general feature of regulatory DNAs, which 
include the transcriptional enhancers, is the use 
of combinatorial codes of transcription factor (TF) 
binding sites [12]. This feature allows an enhancer to
activate its gene only if it binds a specific combination
of different TF proteins. A less understood general
feature is the clustering of multiple binding sites
for a single TF operating at an enhancer [13]. This
unexplained cluster signature has motivated several
bioinformatic screens that exploit binding site density
to identify functional enhancers [14, 15]. Such
methods detect both functional enhancers and
non-functional sequences. Moreover, these methods are
not yet predictive of the exact responses encoded by 
active enhancers bearing site clusters. 

Concentration-specific threshold responses are a 
property of most regulatory DNAs that function 
through recruitment of DNA-binding factors [16].
However, developmental enhancers that read classi- 
cal morphogen concentration gradients [17] are ideal 
subjects in decoding regulatory DNA sequences, and 
their functional features. Different enhancers with 
variably-dense clusters of binding sites for the same 
TF are each responsive to their own unique thresh- 
old concentration. Such DNAs can be studied com- 
paratively to identify the variables that encode the 
concentration threshold setting. In principle, such 
a variable might be encoded in one of several non- 
exclusive categories: i) the formulaic combination 
of adjacent binding sites for TFs acting synergisti- 
cally; ii) the range of sequences that determine the 
affinity or allostery of a DNA-bound TF (functional 
grammars); and iii) the higher-order organizational 
arrangement of binding sites (functional syntaxes). 

Two well-studied systems of morphogen-responsive
enhancers are those that read the Bicoid and Dorsal
morphogen concentration gradients that pattern the
anterior/posterior (A/P) and dorsal/ventral (D/V)
axes of the Drosophila embryo,
respectively [T51 - I29]. Like most enhancers, these
DNAs contain clusters of binding sites, which in
this case correspond to those for Bicoid, Dorsal,
and their DNA-binding co-factors. This clustering
has prompted several complex "cluster code"
models that integrate site number, quality, and
density parameters to determine the threshold
readout [30-32]. Paradoxically, however, the apparent
phenotypic robustness of this "cluster code" to
mutational divergence has been taken to mean that
this full parameter set is simultaneously flexible and
determinative.

To address how concentration-threshold responses
are encoded in Dorsal target enhancers, we
asked whether there exists a unique subset of
specialized TF binding sites in co-clusters of sites for
Dorsal, Twist, and Suppressor of Hairless [Su(H)]
[37]. Specialized binding motifs, as identified across
equivalent enhancers present in a genome and across
related lineages, do not manifest the full range of
sequences known to be bound by these factors and
may signify regulatory sub-functionalizations. With
this approach, we identified two different specialized
binding sites for Dorsal, as well as specialized
binding sites for Twist and Su(H) [37]. Since then, we
have formally referred to DNA sequences that both
drive expression in the lateral embryonic ectoderm
and contain this particular collection of specialized
binding sites as Neurogenic Ectodermal Enhancers,
or NEEs OETlEg.

We found that the NEE at the vnd locus, or
NEE_vnd, is conserved in Drosophila and mosquitos.
As such, it was present in the latest common ancestor
of dipterans ~240-270 million years ago (Mya)
PHES], or at least >200 Mya gj. We found that
conserved "canonical" NEEs occur at the rho, vnd,
brk, and vn loci across the Drosophila genus [7].
As such, the canonical NEEs were acquired prior
to Drosophila diversification over 40 Mya [7]. We
also found a more recently evolved member of this
enhancer class, NEE_sog, in the sog locus of the
melanogaster subgroup, which began diverging ~20
Mya [7]. Thus, NEE-type regulatory sequences have
been evolving at various unrelated loci within a
period spanning the last ~250 My.

NEEs function by recruiting both Dorsal, a
rel-homology domain (RHD)-containing TF, and its
synergistic bHLH co-activator Twist, whose expression
mirrors the Dorsal morphogen gradient [T^1I^2T -
[46]. In addition to having sites for Dorsal and Twist,
NEEs possess sites for Su(H) and Snail. Su(H) is a
highly-conserved TF that mediates transcriptional
responses to Notch/Delta signaling [47-50], while
Snail is a highly-conserved C2H2 zinc-finger TF that
represses activation in the mesoderm [51, 52]. In
D. melanogaster and closely related species, NEEs
also have a binding site for Dip-3 (Dorsal interacting
protein-3) [37], a Dorsal-binding protein required for
Dorsal/Twist synergistic activation and D/V
patterning jS"3"H51)]. Besides these specialized binding
sites, NEEs share distinct organizational features
pertaining to site placement, spacing, and
polarity [37]. These observations suggest that NEEs form
a distinct set of sequences that "read out" the Dorsal
morphogen gradient at various thresholds in the
lateral regions of the embryo through specific protein
complexes composed of Dorsal, Twist, Snail, Su(H),
and their co-factors.

Recently, we determined that the specialized
NEE-type binding sites for Dorsal and Twist have
a unique function in setting the threshold for
activation [7]. In the NEEs from D. melanogaster,
D. pseudoobscura, and D. virilis, we found that: i) the
precise length of a spacer DNA, which separates these
well-defined Dorsal and Twist binding sites, encodes
the concentration threshold setting; ii) natural
selection has acted on the length of this spacer in
different lineages of the Drosophila genus to adjust the
threshold; and iii) these selective cis-regulatory
adjustments have been performed at all NEEs across
a given genome, as would be expected if they are
all co-evolving to a common change in the
trans-morphogen gradient [7]. While this study identified
a heritable feature that encodes different responses
to Dorsal, it did not address its full functional range
nor the function of the many other Dorsal binding
site variants, which constitute the clusters observed
at these enhancers. As such, it was not clear whether
these additional Dorsal motifs were necessary and/or
sufficient for setting the gradient threshold,
participating in activation or repression, or any other
regulatory function.

Here we test several wild-type and 
experimentally-modified NEEs from five divergent 
species of Drosophila: D. melanogaster, D. ananassae,
D. pseudoobscura, D. willistoni, and D. virilis.
Importantly, D. ananassae and D. willistoni represent
the largest assembled Drosophila genomes and
are less derived than the smaller, compact genomes 
of the melanogaster subgroup, which may have lost 
important signatures indicative of past evolutionary 
history [57] . Using this broad data set, we narrow 
the many explanations of binding site clustering 
down to a single, unexpected, but ultimately predic- 
tive hypothesis of concentration-threshold encoding, 
and explain several perplexing constraints on the 
specialized sites of NEEs and their relative organi- 
zation. We show that complex enhancer clustering is 
a signature that ages over time through a dynamic 
evolutionary process involving facile selection for 
optimal threshold readouts and equally facile loss 
and/or selective deprecation of former threshold- 
encodings. This process, which we term dynamic 
deprecation, produces several non-functional signatures
that obscure the precise morphogen threshold-encoding
mechanism that we functionally map and
confirm in this study. We conclude that the clustered 
signature observed in most enhancers is produced 
by the dynamic evolutionary maintenance of the 
accuracy of precise threshold-encodings. 

Results 

Canonical NEEs are marked by cis-spectral
clusters

We found that binding site clusters at NEEs are 
characterized by a certain "cis-spectral" signature, 
and refer to such clusters simply as cis-spectra 
(Fig. 1). Binding site constituents of cis-spectra are 
revealed specifically within or immediately around
the cluster as the motif consensus for a TF is
relaxed. Thus, a cis-spectral binding cluster remains
well-defined with increasing degeneracy of the bind- 
ing motif. For example, if we use a motif spectrum 
of increasingly degenerate binding motifs character- 
istic of Dorsal binding sites, we identify additional 
matching sequences locally within the vicinity of the 
module, thus preserving the definition of the cluster 
(Fig. 1, bottom rows of localized clustering). 
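
The idea of scanning with a motif spectrum can be illustrated with a minimal Python sketch. The three motifs in `SPECTRUM` and the test sequence are hypothetical placeholders chosen only to show progressively relaxed IUPAC patterns; they are not the Dorsal motifs used in this study, and the function names are likewise illustrative.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of an IUPAC motif string."""
    return motif.translate(COMPLEMENT)[::-1]


def find_sites(seq, motif):
    """Sorted 0-based start positions matching the motif on either strand."""
    hits = set()
    for oriented in (motif, reverse_complement(motif)):
        regex = "".join(IUPAC[code] for code in oriented)
        # A lookahead reports overlapping matches as well.
        hits.update(m.start() for m in re.finditer("(?=" + regex + ")", seq))
    return sorted(hits)


def scan_spectrum(seq, spectrum):
    """Map each motif in the spectrum to its list of site positions."""
    return {motif: find_sites(seq, motif) for motif in spectrum}


# Hypothetical spectrum, ordered from strictest to most degenerate.
SPECTRUM = ["GGGAAATTCC", "GGGWWWWYCC", "GGNNNNNNCC"]
SEQ = "AAGGGAAATTCCAAGGGTTTTCCCTTGGCACGTACCAA"

for motif, sites in scan_spectrum(SEQ, SPECTRUM).items():
    print(motif, sites)
```

Because each motif in this toy spectrum is a relaxation of the previous one, every site found by a stricter motif is also found by the more degenerate ones, which is what keeps a cluster well-defined as the consensus is relaxed.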

We defined three specialized cis-element motifs 
that are associated with the cis-spectral clusters of 
canonical NEEs across Drosophila: SUH/Dα, Dβ,
and E(CA)T (see Fig. 1). These motif signatures
correspond to specialized versions of more general
binding motifs for Dorsal, Twist, Snail, and Su(H).
Importantly, the specialized motifs typically describe a
single site at each cis-spectral cluster. 

Despite the numerous binding site variants in
Dorsal cis-spectra, there are only two distinct and
separate specialized Dorsal binding site motifs at
each NEE, here called Dα and Dβ. The specialized
Dorsal binding motif Dα partially overlaps an
overly-determined and polarized Su(H) binding site,
SUH (Fig. 1). In D. melanogaster, SUH is polarized
in the same direction as the μ site, a specialized
binding site for Dip-3 [37]. Furthermore, while the μ
element appears to be absent in distant Drosophila
lineages, SUH is maintained in a polarized state, even
after turnover events [7].

In contrast, the specialized Dorsal binding motif
Dβ is located uniquely within ~20 bp of the E(CA)T
element, the spacing to which encodes the threshold
response. Furthermore, an invariant length-asymmetry
in this nearly palindromic E(CA)T motif
consistently points to Dβ, although Dβ itself
is not polarized. Importantly, we have never
observed any Dorsal binding site variant to be more
tightly linked to the E(CA)T element than the Dβ
element. The E(CA)T element itself is a specialized
CA-core E-box (5'-CANNTG) with an additional
T, i.e., the sequence 5'-CACATGT. This E(CA)T
element is partially explained as the superimposition
of binding preferences for Twist and Snail.
Activating Twist:Daughterless bHLH heterodimers bind the
YA-core E-box 5'-CAYATG, or E(YA), while the Snail
repressor binds the motif 5'-SMMCWTGYBK [SHI55].
Thus, we predicted that such a co-functional site
may originate via selection for the superimposed
motifs, which correspond to the sequence 5'-SCACATGY.
This superimposed Twist/Snail binding motif is
almost identical to the observed E(CA)T motif,
5'-CACATGT.
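
The superimposition of the two motifs can be worked through position by position as an intersection of IUPAC base sets. The sketch below is only an illustration: the function name `superimpose` and the alignment offset of one base are assumptions (the text does not state the alignment), chosen because that offset reproduces the 5'-SCACATGY core quoted above.

```python
# IUPAC code -> the set of bases it permits.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
SETS_TO_IUPAC = {frozenset(bases): code for code, bases in IUPAC_SETS.items()}


def superimpose(motif_a, motif_b, offset):
    """Intersect motif_b with motif_a, position by position.

    motif_b's first base is placed at index `offset` of motif_a; positions
    of motif_a that motif_b does not cover are left unchanged. Returns None
    if the two motifs are incompatible at any position.
    """
    combined = []
    for i, code_a in enumerate(motif_a):
        j = i - offset
        code_b = motif_b[j] if 0 <= j < len(motif_b) else "N"
        common = frozenset(IUPAC_SETS[code_a]) & frozenset(IUPAC_SETS[code_b])
        if not common:
            return None
        combined.append(SETS_TO_IUPAC[common])
    return "".join(combined)


SNAIL = "SMMCWTGYBK"    # Snail motif quoted in the text
TWIST_EYA = "CAYATG"    # YA-core E-box bound by Twist:Daughterless

# Assumed alignment: the E-box starts at the second base of the Snail motif.
combined = superimpose(SNAIL, TWIST_EYA, offset=1)
print(combined)       # SCACATGYBK
print(combined[:8])   # SCACATGY
```

Under this assumed alignment the first eight positions give 5'-SCACATGY, and the observed E(CA)T heptamer 5'-CACATGT is one of the sequences it permits (Y includes T).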

We will refer to the three arranged elements of
the polarized E(CA)T site, the spacer, and an
unpolarized Dβ site as an E-to-D encoding. Using this
terminology, we will show that functional NEE
modules need be composed only of one E-to-D encoding,
supported by a nearby generic Su(H) site. We will
show that the E-to-D sequence is the sole repository
of the threshold encoding variable at each NEE
module, and that cis-spectral clusters and certain
specialized sites are byproducts accumulated in
mature enhancers. Last, we will show that an intrinsic
mutational property of the E(CA)T elements
facilitates the rapid selection of new E-to-D encodings.

Canonical NEEs from the D. willistoni genome are
enriched in cis-spectra

To better understand the functional importance 
of multiple variant binding sites for Dorsal and 
its co-factors within canonical NEEs, we analyzed 
the D. willistoni genome, which is the largest as- 
sembled Drosophila genome (224 Mb) [57]. The 
study of large genomes is important because rel- 
atively compact genomes may have lost DNA sig- 
natures indicative of past evolutionary processes. 
The D. willistoni lineage is an early branch of 
the same Sophophora subgenus that includes the 
melanogaster subgroup, and represents ~37 My of
evolutionary divergence since its common ancestor 
with D. melanogaster, whose genome has been sec- 
ondarily compacted (Fig. 2). 

To identify the canonical NEE set from D. willis- 
toni, it is sufficient to query the genome for all 
800 bp sequences containing the three motifs given 
by SUH/Dα, Dβ, and E(CA)T, without imposing any
syntactical constraints, such as linked Dorsal/Twist 
binding sites or polarized SUH elements. Such a 
query identifies only the four canonical NEEs of 
Drosophila, and these all conform to the full syn- 
tactical rule set, despite significant levels of sequence 
divergence. We also verified that these NEE-bearing 
loci are expressed in the neurogenic ectoderm of D. 
willistoni embryos by whole-mount in situ hybridiza- 
tion (Fig. 3 A-D). 
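
A window query of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pattern names and the regex standing in for a Dorsal-type site are hypothetical placeholders (the actual SUH/Dα and Dβ motif definitions are not reproduced here), only one strand is scanned, and the toy sequence uses a 100 bp window instead of 800 bp.

```python
import re


def find_hits(seq, patterns):
    """Sorted (start, end, name) tuples for every match of every pattern."""
    hits = []
    for name, pattern in patterns.items():
        # Lookahead with a capture group so overlapping matches are kept.
        for m in re.finditer("(?=(" + pattern + "))", seq):
            hits.append((m.start(1), m.end(1), name))
    return sorted(hits)


def windows_with_all_motifs(seq, patterns, window=800):
    """Windows anchored at a motif start that contain all motifs.

    No constraint is placed on the order, spacing, or polarity of sites.
    """
    hits = find_hits(seq, patterns)
    found = []
    for start in sorted({s for s, _, _ in hits}):
        limit = start + window
        names = {name for s, e, name in hits if s >= start and e <= limit}
        if names == set(patterns):
            found.append((start, min(limit, len(seq))))
    return found


# Hypothetical patterns: E(CA)T as given in the text, plus a placeholder
# regex standing in for a Dorsal-type site.
PATTERNS = {"E(CA)T": "CACATGT", "dorsal_like": "GGG[AT]{4}CCC"}
TOY = "A" * 10 + "CACATGT" + "A" * 20 + "GGGAAAACCC" + "A" * 1000 + "CACATGT"

print(windows_with_all_motifs(TOY, PATTERNS, window=100))
```

In the toy sequence only the first E(CA)T copy lies within 100 bp of the placeholder Dorsal-type site, so a single window is reported; the distant second copy is ignored.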

We cloned DNAs encompassing the NEE se- 
quences of D. willistoni and tested them for en- 
hancer activity on a lacZ reporter stably integrated 
into multiple independent lines of D. melanogaster. 
Whole-mount in situ hybridization of these embryos 
with an anti-sense lacZ probe showed that the D. 
willistoni enhancers drive lateral ectodermal expres- 
sion in D. melanogaster embryos (Fig. 3 E-H). These 
results demonstrate that these are functional
enhancers present in loci expressed in lateral regions
of the neuroectoderm in D. willistoni embryos. In
general, D. willistoni NEEs drive slightly narrower 
expression patterns in D. melanogaster than their 
counterpart D. melanogaster NEE reporters, which 
may indicate that they are tuned to higher threshold 
responses (Fig. 4). 

To determine whether the specialized Dorsal
binding sites Dα and Dβ are embedded in clusters
of Dorsal binding site variants as they are in other
lineages, we identified all sites in these sequences
matching a Dorsal motif spectrum and found
extremely dense Dorsal cis-spectra in the NEEs of
D. willistoni (Figs. 5-6). As quantified below, these are
some of the densest clusters yet seen in NEEs of the
Drosophila genus. To ascertain whether the
specialized Dorsal motifs are maintained as unique copies in
each NEE from D. willistoni, or whether additional
Dorsal binding variant sites within each cluster also
match these specialized motifs as would be expected
by random neutral drift 5!) .001 , we applied our
path-finding method to identify and characterize the most
specialized Dorsal binding motifs within their
cis-spectral clusters [37] (also see Supplement Part I).
We find that the Dα site occurs once in each NEE
(Fig. 5). Furthermore, Dα continues to overlap the
Su(H) binding site at this particular specialized
Dorsal binding site. This property is unique to Dα in
canonical NEEs across the Drosophila genus.
Similarly, the Dβ site of D. willistoni occurs only once
in each NEE (Fig. 6). As expected, Dβ is the
closest variant Dorsal binding site adjacent to E(CA)T
(Fig. 6). The Dβ consensus motif for the canonical
NEEs of D. willistoni is nearly identical to the
corresponding motif in other previously-characterized
lineages (Table 1).

The Dorsal cis-spectral clusters of NEEs from
D. willistoni are associated with another feature
that is interesting in light of the reduced genomic
deletion rates relative to D. melanogaster: the
D. willistoni NEEs appear to be enriched in CA-satellite
sequence. Given that the E(CA)T sequence,
5'-CACATGT, is composed entirely of CA-dinucleotide
repeats, we asked whether the Dorsal cis-spectra
of NEEs are overlaid with a similar E(CA)T
spectral cluster. In support of this idea, we found
several lengthy CA-satellite tracts across the canonical
NEE set of D. willistoni (Fig. 7). Almost all of these
are associated with specific constituents of Dorsal
cis-spectra. Conversely, almost all constituent sites
of Dorsal spectra are associated with prominent
CA-satellite tracts. For example, the cis-spectral
cluster of the NEE. U „ of D. willistoni has expanded
CA-satellite tracts associated with divergent Dβ
elements at ~340-400 bp and again at ~580-630 bp,
while the D. willistoni NEE r ^ also has expanded
CA-satellite tracts coordinated to divergent Dβ
elements at ~130-150 bp and again at ~270-290 bp
(Fig. 7). Last, the NEE_vnd sequence, which is the
descendant of the oldest known NEE because it is
found in mosquitos, is characterized by the greatest
number of lengthy CA-satellite tracts in D. willistoni
(Fig. 7).

Constituents of cis-spectra represent
non-functional necro-elements

In the NEE_vnd module of D. willistoni, we detected
the loss of one of two E-to-D encodings that are
present and intact in the NEE_vnd sequences from
the D. melanogaster, D. pseudoobscura, and
D. virilis genomes [3[37]. The first E-to-D encoding has
a tighter spacer compared to the second,
distantly-spaced E-to-D encoding. Furthermore, the Dorsal
binding site at this second encoding is a divergent
Dβ element (Fig. 1). In the D. willistoni lineage, the
E(CA)T element of this second divergent encoding
expanded on both sides and then split apart (Fig. 8A,
inverted CA-satellite palindromic pair #4). This
is unambiguously an inactivating mutation of the
Twist binding element. Furthermore, the NEE_vnd of
D. willistoni is marked by several other such
palindromic tracts (numbered in Fig. 8A), of which the
intact but also expanded E(CA)T site is the leftmost
site in a series of increasingly lengthy, split, inverted
palindromic CA-satellite repeats (Fig. 8B). These
increasingly expanded CA-satellite palindromes are
associated with Dorsal binding site variants that are
increasingly divergent from the Dβ consensus motif
(Fig. 8C).

While the D. willistoni NEE_vnd sequence has lost
the second E(CA)T site through repeat expansion 
and separation of the two palindromic moieties, we 
did not know whether this site functioned in species 
in which it is still intact. We therefore tested two 
different fragments contained within a "full-length" 
949 bp enhancer sequence from the vnd locus of D. 
melanogaster (Fig. 9A). We tested a 300 bp frag- 
ment that contains the first E-to-D encoding spaced 
by 10 bp, and a 266 bp fragment that contains the 
second E-to-D encoding spaced by 20 bp. Both frag- 
ments overlap and contain in common the extended 
SUH/Dα site (Fig. 9A).

We found that the 300 bp fragment works just
as well as the 947 bp fragment (Fig. 9 B, C, and
E), while the 266 bp fragment hardly works at all
(Fig. 9 D and E). Thus, the first E-to-D encoding,
which is intact and tightly spaced, is sufficient for
the complete threshold-response, while the second
E-to-D encoding, which is expansively spaced to a
slightly divergent Dβ element, is non-functional. We
refer to the component sites of the second encoding
as dead elements, or necro-elements, and label them
N-E(CA)T and N-Dβ. While the N-E(CA)T sequence
is intact, inspection of this N-Dβ sequence shows
that it has diverged somewhat from the genus-wide
Dβ consensus (Fig. 9F).

These results indicated that Dorsal cis-spectra
and their associated CA-satellite tracts are relic
E-to-D encodings that were once functional but
eventually deprecated and replaced during lineage
evolution. While the evolution of new encodings will
sometimes occur via selection of spacer length
variants defined by existing elements, at other times
it will occur via selection of new replacement sites
associated with new spacer lengths. Three important
features of E-to-D encodings increase the
capacity for selection of replacement encodings. The
first feature is the palindromic nature of the E(CA)T
and Dβ elements, which allows new E-to-D encodings
to arise from the selection of a single emergent
site that is located on the other side of its
coordinating partner element in an existing encoding
(a leapfrog). The second feature is that the
functional E-to-D spacer range is broad and thus
endows functionality to sub-optimal encodings. The
third feature is that CA-dinucleotide satellite
sequence is susceptible to repeat expansions and
contractions across the Drosophila genus [6"TH6"3"]. We
assume that the E(CA)T sequence 5'-CACATGT is
dynamically unstable in NEEs because this element
is composed entirely of CA-repeats. In support of
this, we found that intact E(CA)T elements in the
NEEs of several Drosophila genomes are frequently
repeat-expanded beyond the core heptamer such
that they match the general pattern given by
5'-(CA)_n T(GT)_m, where n ≥ 2 and m ≥ 1 (Table 2).
This is particularly pronounced in the larger,
uncompacted D. ananassae, D. willistoni, and
D. virilis genomes (Table 2). These observations are of
utmost significance: spacer length variants produced
by an intrinsic repeat instability of the E(CA)T
element will drive different threshold-responses. This
eventuality would also explain the highly invariant
nature of the E(CA)T sequence. Newly-selected
replacement Twist/Snail binding sites will evolve at
target sequences most closely resembling the
dual-functioning site predicted by superimposed
binding preferences (Fig. 10). Initially, such an
emergent site will be associated with a suboptimal
spacer. However, random neutral drift to the
specific E(CA)T sequence would result in the availability
of spacer length variants via CA-satellite repeat
expansion/contraction. Thereafter, frequent occasions
for selection of spacer variants produced by such a
site would result in the apparent "constraint" of the
Twist element.
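
The repeat-expanded pattern can be written as a regular expression, with the core heptamer 5'-CACATGT as the smallest case (two CA units, the central T, and one GT unit). The following minimal sketch assumes that reading of the pattern; the function name is illustrative, and only the strand as written is scanned.

```python
import re

# (CA)n T (GT)m with n >= 2 and m >= 1; n = 2, m = 1 is the core CACATGT.
EXPANDED_ECAT = re.compile(r"(?:CA){2,}T(?:GT)+")


def describe_ecat_sites(seq):
    """Return (start, tract, n, m) for each E(CA)T-like tract in seq."""
    sites = []
    for match in EXPANDED_ECAT.finditer(seq):
        tract = match.group()
        t_index = tract.index("T")            # first T ends the (CA)n arm
        n = t_index // 2                      # number of CA repeats
        m = (len(tract) - t_index - 1) // 2   # number of GT repeats
        sites.append((match.start(), tract, n, m))
    return sites


print(describe_ecat_sites("GGCACATGTAA"))      # core heptamer: n=2, m=1
print(describe_ecat_sites("TTCACACATGTGTCC"))  # expanded on both arms
```

Each added CA or GT unit lengthens the tract by 2 bp, which is how expansion or contraction of the element would shift the spacing to a neighboring site.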

The evolution of threshold readouts via dynamic
deprecation and replacement of encodings, as
facilitated by intrinsic E(CA)T instability, makes several
testable predictions. First, a dynamic deprecation
model is supported if longer CA-satellite tracts in
D. willistoni NEEs are loosely associated with specific
components of Dorsal cis-spectra, especially when
they are spaced beyond the functional range of the
spacer element. Second, necro-element accumulation
may progress in a clock-like fashion followed by
neutral divergence of these sites. Thus, cis-element
spectra for both Dorsal and Twist binding motifs
should be associated with mature NEEs that are
canonical to the lineage, but not with newer NEEs
that might have arisen more recently. Third, we
should find that threshold readout is correlated to
spacer length but not to binding site density. Fourth,
we should be able to remove deprecated encodings
without affecting the threshold readout (as in Fig. 9
C and E). Conversely, isolated deprecated encodings
should not possess lower thresholds compared to the
intact enhancer (as in Fig. 9 D and E).

Canonical NEEs across Drosophila are enriched 
in necro-element spectra 

To address the generality of CA-satellite accu- 
mulation in NEEs across the genus, we checked 
the percentage of CA-satellite in NEEs from D. 
melanogaster, D. pseudoobscura, D. willistoni, and 
D. virilis relative to their genomic background lev- 
els (Table 3). These analyses consistently show that 
CA-satellite is enriched in NEEs above genomic back- 
ground rates. Importantly, this elevated level is not 
due to the presence of intact E(CA)T motifs, which 
constitute only a minor fraction of CA-repeat se- 
quence in NEEs (Table 3). 
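
As a rough illustration of such a composition measurement, the sketch below computes the fraction of a sequence covered by CA-dinucleotide tracts and an enrichment ratio over a background value. The definition used here (runs of at least two CA or TG repeats, counted on the given strand) and both function names are assumptions made for the example; the exact CA-satellite definition behind Table 3 is not specified in this section.

```python
import re


def ca_satellite_fraction(seq, min_repeats=2):
    """Fraction of bases in (CA)n or (TG)n tracts with n >= min_repeats.

    Assumed definition for illustration; TG runs are counted because they
    are CA runs read from the complementary strand.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    pattern = re.compile(
        r"(?:CA){%d,}|(?:TG){%d,}" % (min_repeats, min_repeats)
    )
    covered = sum(len(m.group()) for m in pattern.finditer(seq))
    return covered / len(seq)


def enrichment(enhancer_seq, background_fraction, min_repeats=2):
    """Ratio of enhancer CA-satellite fraction to a background fraction."""
    return ca_satellite_fraction(enhancer_seq, min_repeats) / background_fraction


print(ca_satellite_fraction("CACACAGGGG"))   # 6 of 10 bases lie in a CA tract
print(enrichment("CACACAGGGG", background_fraction=0.2))
```

Subtracting the bases that belong to intact E(CA)T motifs before computing the fraction would give the kind of control described above, in which the elevated level is shown not to come from the functional sites themselves.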

To address the possibility that elevated
CA-satellite composition is a feature common to
developmental enhancers, we then looked at several
canonical enhancers that respond to the Bicoid
morphogen gradient, which patterns the
anterior/posterior (A/P) axis. We identified the
hunchback (hb) enhancers, the giant (gt) posterior
enhancers, the Kruppel (Kr) enhancers, and the
well-studied even-skipped (eve) stripe 2 enhancers from
each of 4 genomes: D. melanogaster, D. pseudoobscura,
D. willistoni, and D. virilis. All of these
enhancers are active in the same embryonic nuclei as
the NEEs and thus constitute a well-matched
control group. We found that while all 16 of these A/P
enhancers possess evolving clusters of Bicoid
binding site spectra (data not shown), none of them
possess the elevated CA-satellite levels that characterize
canonical NEEs from these same species (Fig. 11).
Thus, there is a tremendous sequence bias that is
unique to canonical NEEs across the genus and in
stark contrast to the sequence composition of both
their genomes and other non-NEE enhancer clusters.
Furthermore, this NEE compositional bias is related
to specific functional elements employed by NEEs.

Having found we could identify the extent of
Dorsal cis-spectra with confidence, we then checked its
potential to encode or influence the Dorsal
concentration threshold read-out of NEEs. For example, we
checked the relation between threshold readout and
the density of Dorsal half-sites in a region anchored
±480 bp from Dβ (Fig. 12A). For this, we measured
the stripe width at 50% egg length as
the number of nuclei expressing the reporter gene
from the ventral border of expression up to the
dorsal border. We also found no relation between Dorsal
binding site densities and threshold-encodings after
trying diverse other descriptors of a Dorsal binding
site (data not shown). Identical densities of Dorsal
half-sites, degenerate full-sites, and more complete
full-sites are present in different enhancers that
read out different Dorsal concentration thresholds, and
vice versa.

In contrast, if we plot the length of the E-to-D
spacers for NEEs with unambiguous E-to-D encodings
(i.e., encodings with single intact E(CA)T and
Dβ elements), excepting those from the
dorsally-repressed vnd loci, we see a well-defined,
hump-shaped curve, whose activity peaks at around
7 bp and falls on either side of this maximum. The
spacer elements from the consistently high-threshold
NEE_vnd sequences across the genus obey a similar,
albeit depressed, curve because of one additional
regulatory input (data not shown). Thus, the
elevated CA-satellite content and its associated Dorsal
cis-spectra are consistent with the central hypothesis
that the sequence composition of these enhancers
has been shaped by a long history of repeated
deprecation and compensatory selection of E-to-D
encodings by a process which has been active for more
than 200 My in the case of the NEE_vnd sequence,
and more than 40 My at other canonical NEEs.

Given the extent of cis-spectral signatures
associated with Dorsal and Twist binding elements
in mature NEEs, we asked whether the specialized
Dα site, which overlaps an unusually specialized
Su(H) binding site, might also be a Dβ
necro-element that was conveniently turned into a Su(H)
site. To address this question, we first compared the
Dα and Dβ consensus motifs across all five divergent
Drosophila lineages for which we functionally tested
NEEs in D. melanogaster (Table 1, Fig. 13A).
Remarkably, we find that the second half of Dα has
diverged across the genus faster than the first half.
This second half is the portion that does not overlap
the Su(H) binding site. Unlike the slight
lineage-specific variations of Dβ, Dα motif divergence can
be characterized as increasingly degenerate when
departing from the ancestral Dα motif, which is closest
to a Dβ motif itself.

To test whether the Su(H) binding site is itself 
functional and perhaps the principal reason for per- 
sistence of the "ghost" Da motif, we specifically mu- 
tated the Su(H)-specific portion of the SUH/Da site 
in the NEE r / lo of D. melanogaster (Fig. 13 A and C). 
This specific mutation appears to weaken the activa- 
tion response of the enhancer without affecting the 
specific threshold setting (Fig. 13 B-C). Because we 
have shown a general tendency of functional E(CA)T 
elements to have expanded beyond the heptamer se- 
quence (Table 2), and of deprecated E(CA)T elements 
to have experienced runaway expansion into longer 
tracts (Figs. 7-8), we suspect that this process tends 
to push away combinatorial enhancer elements, such 
as Su(H) binding sites. In this context, selection 
may favor new Su(H) binding sites that are closer 
to the current functional encoding. Deprecated
N-Dβ sequences are similar to sequences
matching the Su(H) binding motif and thus provide
a convenient set of target sites for re-evolving more
proximal Su(H) sites.

Newly evolved NEEs are not enriched in cis-spectra

Our results on the canonical NEEs of the four divergent
lineages of D. melanogaster, D. pseudoobscura,
D. willistoni, and D. virilis demonstrate that
much of their sequence composition corresponds to
relic deprecated encodings. This pertains not only to
the sequences in between intact Dorsal, Twist/Snail,
and Su(H) binding motifs but to most of the recognizable
and intact TF sites and variants as well.
Because we predict that necro-element accumulation
is a neutral signature related to the number of
past threshold adaptations, which likely increases
with age, we were curious about the extent
of cis-spectral signatures in younger NEEs. We previously
documented a new NEE sequence at the sog
locus of D. melanogaster [7]. The D. melanogaster
NEE_sog sequence has a CA-dinucleotide content of
14.4%, which is on par with the highest levels seen in
A/P enhancers from all lineages but is mid-range for
NEEs from D. melanogaster (compare with Fig. 11B
points in the A/P box). However, because the CA-content
of NEEs from D. melanogaster may have
been secondarily reduced, we wanted to
query uncompacted Drosophila genomes with a parameter
set that is constrained only by the minimal
molecular requirements. Thus, we queried the
two largest Drosophila genome assemblies, which 
corresponded to D. ananassae (231.0 Mb) and D. 
willistoni (235.5 Mb). Both of these species are 
in the Sophophora subgenus, which includes D. 
melanogaster. 
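
For reference, a percent-content figure such as the 14.4% CA-dinucleotide content quoted above can be computed along the following lines. This is only a minimal sketch: it assumes that content means the fraction of bases covered by non-overlapping matches on the given strand, which the text does not spell out, and the function name `dinucleotide_percent` is illustrative rather than taken from the scripts used in the study.

```python
def dinucleotide_percent(seq: str, motif: str = "CA") -> float:
    """Percent of bases in seq covered by non-overlapping copies of motif.

    Assumption: only the given strand is scanned, and each match
    contributes len(motif) bases to the covered total.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    # str.count counts non-overlapping occurrences, left to right.
    covered = seq.count(motif) * len(motif)
    return 100.0 * covered / len(seq)
```

Calling `dinucleotide_percent(window, "CAC")` gives the corresponding trinucleotide figure under the same assumptions.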

Of the 1 kb genomic windows centered on all
Dβ instances in any given genome and containing
E(CA)T anywhere in that window, we identified
the subset of these sequences that also contained a
generic ("un-specialized") Su(H) binding site as well
as linked Dorsal and Twist binding elements. The
generic Su(H) site replaces the composite extended
motif that described an overly-determined SUH element
and the overlapping Dα ghost site. Using
this set of minimal criteria, we nonetheless were able 
to identify the canonical NEE repertoires for each 
species. 
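
The window query described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the shell/perl pipeline used in the study. The Dβ motif is the D. melanogaster consensus from Table 1; the E(CA)T heptamer (`CACATGT`) and the generic Su(H) motif (`YGTGDGAA`) are assumed placeholders that are not given in this section; and the additional requirement that the Dorsal and Twist elements be linked is omitted for brevity.

```python
import re

# IUPAC DNA codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "[CG]", "W": "[AT]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGTSWRYKMBDHVN", "TGCASWYRMKVHDBN")

D_BETA = "NVVSGGAAABYCCM"   # D. melanogaster Dβ consensus (Table 1)
E_CA_T = "CACATGT"          # ASSUMED E(CA)T heptamer, not stated in this section
SUH_GENERIC = "YGTGDGAA"    # ASSUMED generic Su(H) motif, hypothetical placeholder


def compile_motif(motif):
    """Compile an IUPAC motif into a regex matching either strand.

    A lookahead is used so that overlapping matches are all reported.
    """
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    alternatives = ["".join(IUPAC[b] for b in m) for m in (motif, revcomp)]
    return re.compile("(?=(" + "|".join(alternatives) + "))")


def candidate_windows(genome, anchor=D_BETA,
                      required=(E_CA_T, SUH_GENERIC), half=500):
    """Return (start, end) for each ~1 kb window centered on an anchor
    site that also contains every required motif."""
    genome = genome.upper()
    anchor_re = compile_motif(anchor)
    required_res = [compile_motif(m) for m in required]
    windows = []
    for hit in anchor_re.finditer(genome):
        center = hit.start() + len(anchor) // 2
        start, end = max(0, center - half), min(len(genome), center + half)
        window = genome[start:end]
        if all(r.search(window) for r in required_res):
            windows.append((start, end))
    return windows
```

Passing a different `required` tuple makes it easy to compare the minimal criteria with stricter ones, such as a composite SUH/Dα motif.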

From the D. ananassae genome, we identified,
cloned, and tested both a functional set of canonical
NEEs (Fig. 14) and a new NEE at the Delta (Dl)
locus (Fig. 15). Delta encodes a ligand for the Notch
receptor, whose signaling is relayed by the Su(H) TF
itself [49, 64]. In D. melanogaster embryos, Delta
is expressed in a narrow lateral stripe in the mesectoderm
and ventral-most row of the neurogenic
ectoderm using sequences that are unrelated to the
NEE_Delta sequence of D. ananassae [50].

Like the NEE_sog sequence, which matured in the
melanogaster subgroup, the NEE_Delta sequence in
D. ananassae has not yet accumulated either CA-satellite
content or the Dorsal cis-spectra characteristic
of necro-element plaques (Fig. 15A). Nonetheless,
this enhancer is functional in D. melanogaster
embryos (Fig. 15B). Inspection of its Su(H) binding
site reveals that it does not overlap a ghost Dα
motif, which demonstrates again that Dα is not required
(Fig. 15C). This is consistent with the interpretation
that Dα motifs are deprecated Dβ motifs
exapted into functional SUH elements at mature
NEEs, whose sequence compositions have been
biased by long histories of necro-element accumulation.

The NEE_Delta enhancer has a spacer of 3 bp and
occupies the low end of the threshold mapping function
(Fig. 12). Therefore, because we characterized
both high and low threshold NEEs that have evolved
more recently in the Delta and sog loci of D. ananassae
and D. melanogaster, respectively, without much
necro-element accumulation, the cis-spectra of mature
NEEs are likely unrelated to function. Instead,
the absence of necro-element plaques suggests a
shorter period of evolutionary maintenance, consistent
with their phylogenetic distribution.

Discussion 

In this study of regulatory DNAs from the 
Drosophila genus, we found that a certain Dorsal- 
threshold encoding mechanism maps a spacer length 
of 3-15 bp, which links a pair of well-defined Dor- 
sal and Twist binding sites, onto one well-defined 
dorsal border of expression that is 5-15 nuclei past 
the ventral border of the neurogenic ectoderm. The 
specialized Twist-binding E(CA)T sequence is a con- 
strained motif that satisfies binding preferences for 
both the Twist activator and the Snail mesodermal 
repressor. This sequence is also a palindromic CA- 
satellite sequence that is prone to CA-dinucleotide 
repeat expansions that alter the precise threshold 
setting spacer. Natural selection acts continuously 
to exploit E(CA)T instability to adapt the precise, 
threshold-setting spacers between adjacent and in- 
tact Dorsal and Twist binding elements. This pro- 
cess may also accelerate site turnover, because it 
would frequently necessitate stabilizing selection of 
compensatory threshold settings in response to this 
intrinsic instability. Thus, evolutionary maintenance 
of optimal NEE function involves the clock-like production
of dead Dorsal and Twist binding elements,
which we call necro-elements. Necro-element accumulation
is the major determinant of sequence composition
in enhancers that have matured beyond a
certain age (>10 My). Further genomic sampling
of taxa will allow refinement of the necro-element
clock and help ascertain whether it reaches a saturation
point for the most ancient enhancers. This question
increases the need to sequence larger genomes
that are not compressed secondarily by high deletion
rates [65].
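
To make the spacer variable concrete, the following Python sketch measures the gap between each Dβ match and each nearby E(CA)T match and checks it against the 3-15 bp functional range reported here. Several points are assumptions made only for illustration: the E(CA)T heptamer is taken to be `CACATGT`, only the given strand is scanned, and the spacer is measured edge to edge between the two motif matches, which may not correspond exactly to the boundaries used in the figures.

```python
import re

# Dβ consensus for D. melanogaster (Table 1), pre-expanded from IUPAC codes.
D_BETA = re.compile(r"(?=([ACGT][ACG][ACG][CG]GGAAA[CGT][CT]CC[AC]))")
# ASSUMED E(CA)T heptamer; not stated in this section.
E_CA_T = re.compile(r"(?=(CACATGT))")


def spacer_lengths(seq, max_spacer=30):
    """Return (dbeta_start, ecat_start, spacer_bp) for every non-overlapping
    Dβ / E(CA)T pair separated by at most max_spacer bp."""
    seq = seq.upper()
    d_sites = [(m.start(), m.start() + len(m.group(1))) for m in D_BETA.finditer(seq)]
    e_sites = [(m.start(), m.start() + len(m.group(1))) for m in E_CA_T.finditer(seq)]
    pairs = []
    for d_start, d_end in d_sites:
        for e_start, e_end in e_sites:
            if e_start >= d_end:        # E(CA)T lies 3' of Dβ
                gap = e_start - d_end
            elif d_start >= e_end:      # E(CA)T lies 5' of Dβ
                gap = d_start - e_end
            else:                       # overlapping matches are skipped
                continue
            if gap <= max_spacer:
                pairs.append((d_start, e_start, gap))
    return pairs


def in_functional_range(spacer_bp, low=3, high=15):
    """True if the spacer falls in the 3-15 bp range mapped in this study."""
    return low <= spacer_bp <= high
```

Pairs whose spacer falls outside the functional range would, under the model presented here, be candidates for deprecated encodings rather than the current one.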

We found that the specialized Su(H) binding
site SUH is exapted from deprecated, non-functional
Dorsal binding sites in all canonical
NEEs of Drosophila. SUH appears to influence the
strength of activation without affecting the Dorsal 
concentration-threshold response. This site is spe- 
cialized in mature NEEs but not in more recently 
evolved NEEs. This unusual turnover process for 
Su(H) sites may be necessitated by the tendency of 
CA-satellite expansion to act as a "conveyor belt" 
pushing out coordinating elements such as the Su(H) 
binding site, but leaving a convenient path of dep- 
recated elements that are easily exapted into closer 
Su(H) sites. 

We found that functional NEEs can be derived 
from truncated fragments of mature NEEs that lack 
necro-elements while continuing to encode the cor- 
rect threshold setting. Also, functional NEEs have 
evolved more recently at non-canonical loci with- 
out having yet accumulated the characteristic necro- 
element plaques seen in older NEEs. Such NEEs 
bear Su(H) sites that do not extend to deprecated, 
ghost Dorsal binding sites. 

Last, we found a smooth continuum between in- 
tact NEE elements and increasingly divergent dep- 
recated necro-elements in these enhancers. Further- 
more, because the extreme range of this continuum 
is associated with the age of the enhancer, we infer 
that necro-element accumulation begins with each 
NEE origination and is continuously co-extant with 
its adaptive maintenance. This has led us to a richly- 
predictive yet parsimonious model of NEE evolution 
that we call dynamic deprecation (Fig. 16). With in- 
creasing time, the background sequence composition 
of enhancers is profoundly altered and eventually 
dominates the nature of binding site sequences be- 
cause it provides a highly-biased ground state from 
which new sites are exapted. 

Defining necro-elements, cis-spectra, and deprecated
necro-elements. We have used the term
necro-element initially to describe intact or nearly 
intact binding sites occurring within well-defined 
clusters but which are no longer relevant in the cur- 
rent threshold encoding. This term can be applied 
to sites subjected to dynamic deprecation, includ- 
ing those that are deprecated solely through changes 
in syntax. However, because there is no clear di- 
viding line between potentially-functional binding 
sites deprecated by syntax and increasingly diver- 
gent sites, we have chosen to expand the use of 
"necro-element" to refer to the entire continuum 
constituting a clustered plaque of necro-elements. 
We call such clusters cis-spectra in order to dis- 
tinguish them from functional "clusters" of binding 
sites. Cis-spectra are well-defined operationally as 
motif clusters that remain distinct from background
genomic sequence as the degeneracy of the matching 
motif is increased and additional, presumably older, 
relic sites are revealed. In this context, we used the 
term motif spectrum to refer to the bioinformatic 
set of motif descriptors that detect cis-spectra for a 
given TF. 
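
The operational definition above can be illustrated with a short Python sketch that counts matches to a consensus as the allowed degeneracy increases. Note the substitution: the motif spectrum in this study is a set of increasingly degenerate IUPAC motifs, whereas this sketch uses the number of mismatches to a single consensus as a stand-in for degeneracy. It also scans one strand only, and the function names are illustrative.

```python
# Bases permitted by each IUPAC DNA code.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG", "W": "AT", "R": "AG", "Y": "CT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(site, consensus):
    """Number of positions where site violates the IUPAC consensus."""
    return sum(base not in IUPAC_SETS[code] for base, code in zip(site, consensus))


def motif_spectrum(seq, consensus, max_mismatch):
    """Cumulative site counts: element k is the number of windows in seq
    lying within k mismatches of the consensus, for k = 0..max_mismatch."""
    seq = seq.upper()
    width = len(consensus)
    counts = [0] * (max_mismatch + 1)
    for i in range(len(seq) - width + 1):
        d = mismatches(seq[i:i + width], consensus)
        for k in range(d, max_mismatch + 1):
            counts[k] += 1
    return counts
```

Running `motif_spectrum` on an enhancer window and on background sequence of equal length gives two count profiles; a cis-spectrum in the sense used here would be one whose counts stay above background as the mismatch allowance grows.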

The use of the prefix necro- rather than the prefix
pseudo- is justified by several important distinctions
that are peculiar to necro-elements. Etymologically,
the Greek root ψευδο- means 'false', while
the Greek root νεκρο- means 'dead' and more accurately
connotes 'loss of function'. This is an important
distinction because biological systems are
rich in functional dissimulation (e.g., mimicry and 
camouflage on an organismal scale, but also extend- 
ing to viral oncogenes that dissimulate normal cellu- 
lar genes, and potentially true pseudo-elements that 
function as decoy DNA elements to sequester a tran- 
scription factor). Biologically, the chosen term must 
encompass in its definition both deprecated and non- 
deprecated elements, as well as both non-functional 
and functionally-redundant elements. Convention- 
ally, the usage of the pseudo- prefix for sequence 
lengths on the length-scale of cis-elements is un- 
wieldy because it is used almost exclusively for recog- 
nizable homologs of protein-coding genes with clear 
inactivating mutations (e.g., internal stop codons
and frameshifts). Necro-elements cannot always be
identified by sequence alone because they can be ren- 
dered functionally redundant or non-functional by 
selection on syntax. 

We also used the term deprecation to connote
additional information as to the probable role of selection
in producing a necro-element. The term deprecated
necro-element usefully distinguishes
a necro-element that has undergone selection for
attenuated or complete loss of function in connection
with the selection of a replacement threshold-encoding
located either at the enhancer or elsewhere
in the locus. Thus, deprecation implies that selection
was active in removing an epistatic relationship
between two conflicting threshold-encodings.
Selection may favor such an outcome when the
pre-deprecated functional element encodes a lower
threshold than the positively selected replacement
encoding. In such cases, a low-threshold encoding
masks the function of any high-threshold encoding
under positive selection and must engender
active selective deprecation. On the other hand, if a
high-threshold encoding is being selectively replaced
by a low-threshold encoding, we expect no active
deprecation forces. Instead, we expect gradual loss
of function via neutral drift. This is an unexpectedly
novel evolutionary mechanism for generating
apparent regulatory redundancy. In this context, we 
suggest that redundant "shadow enhancers", which
have been observed at several Dorsal target loci in
the D. melanogaster genome [66], should be incorporated
into the same dynamic deprecation framework
when appropriate. Selection may adapt an exist- 
ing threshold encoding or transition its focus to a 
new threshold encoding that is located either within 
the same enhancer or elsewhere in the locus. Multi- 
ple such events are likely to pepper the idiosyncratic 
histories of different lineages at different times. In 
this context, shadow enhancers may be defined as 
out-moded enhancers, which were either redundant 
when replaced by distant low threshold enhancers, or 
actively deprecated by selection until their threshold 
was at least higher than a newer optimal low thresh- 
old enhancer located elsewhere in the same locus. 

Summary and implications. In principle, cis- 
spectral plaques of necro-elements should accumu- 
late in all complex eukaryotic enhancers that encode 
key regulatory variables in a precise syntax. The ex- 
tent of this clustering would then be determined by 
the age of the enhancer, and the number or rate of 
replacement adaptations over this time. While many 
of the intensely studied enhancers of Drosophila have 
corresponded to early embryonic enhancers that are 
evolutionarily sensitive to changes in egg size and
morphology, they are also proving useful in untan- 
gling the molecular and evolutionary aspects of en- 
hancer biology. 

In this evolutionary context, the biology of necro-element
spectra of D/V enhancers appears to be directly
applicable to A/P enhancers responsive to the
Bicoid morphogen gradient system. Evolution of 
egg size and developmental timing during embryo- 
genesis is likely to place evolutionary demands on 
both A/P and D/V morphogen gradient systems, 
which are operating simultaneously in the same cells. 
While we have shown that Dorsal binding site den- 
sity does not correlate with threshold encoding, oth- 
ers have shown that Bicoid binding site strength in 
the heavily-clustered A/P enhancers does not correlate
with A/P position of activity [67]. Under the
dynamic deprecation theory of enhancer evolution, 
this paradox is explained if the majority of Bicoid 
binding site variants at such clusters represent necro- 
elements deprecated by mutations affecting the site 
itself, its coordinating site(s), and/or their syntac- 
tical relation. This interpretation can be confirmed 
by future studies identifying the minimal molecular 
requirements for encoding variable Bicoid-response
thresholds. 

One important implication for current studies is 
that motif descriptors and algorithmic motif predic- 
tors should be constructed over a judiciously-chosen 
set of functionally-equivalent sites across a genome, 
rather than on the continuum of necro-element spec- 
tra at a cluster. Such clusters are often exploited 
statistically to increase the number of "example ele- 
ments". Such approaches lead to degenerate motifs 
describing both extant functional elements and sur- 
rounding deprecated sequences. Newer approaches 
that are both alignment-free and wary of exploiting 
the abundance of related sequences will do better at 
distinguishing functional elements from evolutionary
artifact [37, 68].

The conceptual re-framing of the functional evo- 
lution of enhancers overturns a common assumption 
that all binding site variations within an enhancer 
are functional and/or subtly necessary. This as- 
sumption has been directly responsible for the im- 
pression that the "cluster code" is "flexible", by 
which is meant that enhancer activity is robust to 
mutational disruption [33-36]. However, whether
these site sequences are flexible or not flexible is only 
a productive question if the observed sequences are 
functional in some way. In contrast, our results have 
supported the existence of a precise encoding scheme 
that uses only a limited subset of sites in the cluster
[37]. Mutational variation in the organization of
these specialized sites produces a specific and well-mapped
range of expression phenotypes [7]. Indeed,
because this precise encoding scheme turns brittle 
when extended past its functional range, selective 
deprecation is facilitated. This view is further en- 
riched by considering the complex macroevolution- 
ary processes that result when taxa and lineages per- 
sist through several expansions caused by non-static 
ecological/climatic conditions [69-72]. Regulatory
evolution is likely to underlie many of the stabiliz- 
ing and adaptive changes associated not only with 
these climate-driven historical events but future cli- 
mate changes as well [73]. 

The potential for gene regulatory evolution is
greater when encoding schemes for relevant regulatory
traits are broad-ranged functions that map
genotype (enhancer sequence) to phenotype (expression
profile). Precise codes provide the additional
category of syntax on which natural selection can
act. However, a broad or evolutionarily-varied phenotypic
range may be a simple consequence of molecular
mechanisms that are employed ontogenetically
at multiple loci in precise but varied functional configurations.
Understanding this complex relation between
molecular encoding systems and their complex
evolutionary histories may prove useful in gauging
the intrinsic adaptive potential of specific systems
subjected to future climate change [74].

Materials and Methods 

Embryonic experiments. Animal rearing, P-element
mediated transformations, embryonic collections,
staging, anti-DigU probe synthesis, and
whole-mount in situ hybridizations were conducted
as previously reported [7].

Probes for whole-mount in situ hybridization in
D. willistoni embryos. Primers for probe synthesis
are listed here, where Pr = the T7 promoter sequence
5'-CCGCC TAATA CGACT CACTA TAGGG.
rho: 5'-CCGCC TTTGC CTATG ACCGT TATAC AATGC and 5'-Pr-TTAGG ACACA CCCAA GTCGT GC.
vn: 5'-CCGCC TAGTG ACGAC AACAA CAACA GTAGC and 5'-Pr-ATTTT CACTCA CAGCC ATTTT CACC.
vnd: 5'-CCGCC CTAGT CCGGA TAGCA CTTCG C and 5'-Pr-CGGCT GCCAC ATGTT GATAG G.
brk: 5'-CCGCC AACAA AGTTC GTCGG CAACA ACG and 5'-Pr-CATGG TGAGG TGAGG ACTAT GG.

Whole genome sequence analysis. Current ver- 
sions for all genomes were downloaded from Flybase 
(www.flybase.org) and these correspond to assembly
versions: dmel ver5.22, dana ver1.3, dpse ver2.6,
dwil ver1.3, and dvir ver1.2. Various whole-genome
queries were conducted using shell scripts composed 
of shell, perl, grep, and wc UNIX commands and are 
available upon request. Separate queries were con- 
ducted for NEE signatures and CA-satellite content. 
Special genome files were processed for counting percent
content of a given motif. We call these "*.HNF"
files because they are header- and N-free: headers and
N characters have been replaced by newline characters.
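
As an illustration of this kind of processing, the following Python sketch builds a header- and N-free text from FASTA input and computes the percent content of a motif given as a regular expression. It is not the shell/perl code used in the study. It assumes that sequence lines within a record are joined before splitting at headers and N runs, and that percent content is matched bases divided by total bases. The function names are hypothetical, and the pattern is the CA-satellite definition given in the legend of Fig. 15.

```python
import re

# CA-satellite: two or more CA-dinucleotide repeats (see Fig. 15 legend).
CA_SATELLITE = re.compile(r"A?(CA){2,}C?")


def to_hnf(fasta_text):
    """Return header- and N-free text: one contiguous segment per line.

    Assumption: lines of a FASTA record are concatenated first, then
    headers and runs of N become segment (newline) boundaries.
    """
    records, current = [], []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append("".join(current))
            current = []
        else:
            current.append(line.strip().upper())
    records.append("".join(current))
    segments = []
    for record in records:
        segments.extend(s for s in re.split(r"N+", record) if s)
    return "\n".join(segments)


def percent_content(hnf_text, pattern):
    """Percent of bases covered by non-overlapping matches of pattern."""
    segments = hnf_text.split("\n")
    total = sum(len(s) for s in segments)
    matched = sum(m.end() - m.start() for s in segments for m in pattern.finditer(s))
    return 100.0 * matched / total if total else 0.0
```

Because matches are found within each segment separately, no motif match can span a header or a run of N, which is the practical benefit of the newline replacement.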
Figure 1. Organization of specialized elements within Dorsal cis-spectra of canonical NEEs.

Shown are the specialized sites embedded within the Dorsal cis-spectra of the D. melanogaster NEE_vnd
sequence, which is representative of canonical NEEs at the rho, vn, vnd, and brk loci of the Drosophila 
genus. Numerous lines of evidence in this study demonstrate that the Dorsal cis-spectra are specific to 
mature NEEs (>40 My old), non-functional, and likely produced by dynamic deprecation of precisely 
spaced Dorsal and Twist sites. Dorsal cis-spectra are defined by a motif spectrum of increasingly 
degenerate Dorsal binding motifs. All instances of the motifs listed in the key are shown in the graphic. 
The motif sequences in all of the figures and text are written according to IUPAC DNA convention: S = 
[CG], W = [AT], R = [AG], Y = [CT], K = [GT], M = [AC], B = [CGT], D = [AGT], H = [ACT], V = [ACG], N = [ACGT],
where nucleotides in brackets are equivalent. All Dorsal binding sites, motifs, and variants will be depicted 
with the best halfsite on the 5'- side regardless of its polarity to E(CA)T. 
Figure 2. Drosophila phylogeny with tested NEE sequences.

In this study, we expand our previous studies to two genomes not marked by secondarily-derived compact 
genome sizes. These genomes correspond to the D. ananassae and D. willistoni lineages (blue). We also 
expand our analyses by testing additional mutated versions of these and previously cloned enhancers. 
Figure 3. Functional NEEs from D. willistoni.

Functional NEEs from D. willistoni occur in canonical loci that are also expressed in the neurogenic
ectoderm. A-D) NEE-bearing loci in D. willistoni are expressed endogenously in the neuroectoderm of
stage 5(2) embryos as shown by in situ hybridization with an anti-sense RNA probe to exonic sequences.
E-H) NEE sequences from D. willistoni can drive a lacZ reporter gene in transgenic D. melanogaster
embryos as shown by in situ hybridization with an anti-sense RNA probe to lacZ. Embryos in all figures 
are depicted with anterior pole to the left, and dorsal side on top. Image labels indicate the species of the 
embryo, and the gene or reporter being detected. All reporters are in D. melanogaster embryos. 
Figure 4. D. willistoni NEEs are set to higher concentration thresholds than D. melanogaster.

See text. 
Figure 5. Dorsal cis-spectra in canonical D. willistoni NEEs contain a single Dα site.

Constituents of Dorsal cis-spectra in D. willistoni NEEs are visualized by matches to Dorsal halfsites (base
D halfsite, pale blue) and degenerate full sites (base D, light blue) as shown in the key. One such site at
each canonical NEE matches the Dα consensus (purple). This same site overlaps a Su(H) binding site
(SUH, red), which occurs on the top strand at each NEE. For efficient referencing across the set, all NEEs
from D. willistoni are aligned and centered on the unique Dα site, plus or minus 400 bp, unless otherwise
stated. 
Figure 6. Dorsal cis-spectra in canonical D. willistoni NEEs contain a single Dβ site.

One Dorsal binding site variant in each cluster matches the Dβ consensus (dark blue). This specialized Dβ
site is the closest (<30 bp) Dorsal binding site variant to the E(CA)T element (green), which is a binding
site for the Dorsal co-activator Twist, and the Snail mesodermal repressor. Sites matching this specialized
Dorsal binding motif Dβ are distinct from the Dα elements (numbered purple labels).
Figure 7. Canonical NEEs from D. willistoni are enriched with CA-satellite.

All canonical NEEs from the D. willistoni genome also are enriched in CA-satellite, almost as much as the
NEE_vnd sequence, which was present in the latest common ancestor of dipterans (see text). Furthermore,
the longest such tracts are associated with divergent Dβ halfsites (pale blue). The NEE_vn cis-spectra have
expanded CA-satellite tracts associated with ghost Dβ elements at ~340-400 bp and again at ~580-630 bp,
while NEE_rho also has expanded CA-satellite tracts coordinated to ghost Dβ motifs at ~130-150 bp and
again at ~270-290 bp. Such signatures are consistent with the hypothesis that much of the clustering is
evidence of past deprecation events between precisely spaced Dβ and E(CA)T elements. Enhancers are
aligned on the unique SUH/Dα site at position 400 bp (see Fig. 5).
Figure 8. The vnd NEE from D. willistoni is enriched in split, palindromic CA-satellite.

Analysis of the vnd NEE sequence in the relatively uncompacted D. willistoni genome indicates a long 
history of instability at E(CA)T elements. Such signatures could be variably interpreted as past selection for 
new E(CA)T elements or new optimized spacer lengths, intrinsic mutational bias for repeat expansions, 
and/or both of these combined. A) Split, palindromic CA-satellite tracts are present in the NEE_vnd of D.
willistoni as visualized by matches to short CA-satellite motifs (5'-CACA or 5'-ACAC). The larger palindromic 
CA-satellite tracts are numbered and their sequences shown in B. B) The exact sequence composition of the 
CA-satellite indicates that these were once intact E(CA)T elements as found at the presumed functional site 
located in palindrome #d (green box). However, even the intact E(CA)T shows recent expansion in this 
lineage. Such expansions or contractions relative to the Dβ motif alter the precise length of the linking
spacer and consequently also alter the precise Dorsal concentration threshold of the enhancer. C) 
Increasingly longer, and presumably older CA-tracts are associated with increasingly divergent Dorsal 
binding site variants as shown. For each such Dorsal binding site variant listed the Hamming Distance 
(HD) or number of mismatches (red letters) from Dβ is indicated.
Figure 9. The second E-to-D encoding within NEE_vnd was deprecated prior to divergence of
the Drosophila genus.

A) Unlike D. willistoni, the NEE_vnd in D. melanogaster has two apparently intact threshold-encodings,
one of which is coordinated by a 10 bp spacer (narrow yellow column), and another that is coordinated by 
a 20 bp spacer (wide yellow column). Motifs follow the key in Fig. 1 except the Dorsal binding spectra are 
shaded with decreasing intensity as degeneracy increases. The 947 bp "full-length" fragment encompasses 
the entire 720 bp shown in the graphic. Two smaller tested fragments are shown in dark bold lines. Both 
of these overlap and include the SUH/Dα site (red/blue stack). B) The 947 bp NEE_vnd "full-length"
enhancer sequence drives a normal pattern of lacZ expression. C) The 300 bp NEE_vnd subfragment drives
a similar pattern as the full-length version, despite the absence of the second coordinated Dorsal/Twist
binding site pair. D) The 266 bp NEE_vnd subfragment fails to drive a robust lateral stripe of lacZ
expression at any threshold. Faint staining is occasionally seen in a lateral patch towards anterior pole. E) 
Quantification of the stripe width over several embryos for each construct depicted in A-D shows that the 
full-length enhancer is not measurably different than the 300 bp fragment containing a single E-to-D 
encoding. F) The Dorsal binding site coordinated by 20 bp to the second E(CA)T element is divergent (red
letters) from the Dβ consensus for D. melanogaster. This D. melanogaster Dβ consensus matches the Dβ
consensus motifs in other lineages more closely than a D. melanogaster consensus made with the 20 bp coordinating
Dorsal binding site variant. 
Figure 10. E(CA)T versus Twist and Snail binding motifs in D. willistoni NEEs.

The simple superimposition of motifs representing binding preferences for Twist bHLH complexes and the 
Snail C2H2 zinc-finger transcriptional repressor, results in a predicted dual motif that is similar but not 
identical to the observed E(CA)T motif. Because the E(CA)T motif appears to be subject to repeat 
expansions and contractions, as seen in Table 2, and because this would result in threshold-modifying 
variants, we believe that the peculiar difference between the predicted dual site and the observed invariant 
site, is strong support for our evolutionary model of dynamic deprecation of encodings via CA-satellite 
instability. These motifs are depicted here. 
Figure 11. High levels of E(CA)T fragments have accumulated in canonical NEEs across the
genus. 

The percentage of sequence that is composed of either 5'-CA dinucleotides or 5'-CAC trinucleotides is 
graphed for several orthologous groups of enhancers from D. melanogaster, D. pseudoobscura, D. willistoni,
and D. virilis. Each window of NEE sequence is taken ±480 bp from Dβ for each species. Each window of
an A/P enhancer is a 960 bp sequence centered around the Bicoid binding site cluster. A) Each 
orthologous set of NEEs is boxed separately to visualize enrichment relative to other groups. In contrast to 
the canonical NEEs, the Bicoid binding site clusters of several canonical A/P enhancers at the eve, gt, Kr, 
and hb loci are not associated with high CA-satellite content. All 16 of these enhancers fit within the blue
box shown in the graph. B) Same as A, except NEEs are boxed by species. Because D. willistoni and D. 
virilis represent lineages from each of the subgenera of Drosophila, this graph highlights the 
secondarily-derived, reduced state of CA-satellite in D. melanogaster NEEs. 
Figure 12. The spacer between the E(CA)T and Dβ encodes the threshold-response to the
Dorsal morphogen concentration gradient, and is independent of the number or density of 
variant Dorsal binding sites. 

A) The number of Dorsal halfsites in the ~1 kb window ± 480 bp from Dβ from diverse NEEs of varied
age, lineage, and locus, is not predictive of the precise Dorsal concentration threshold readout. B) In
contrast, the precise spacer length between the E(CA)T and Dβ elements is predictive (red trendline,
second order polynomial) of the precise threshold readout over a range from 3 bp to 15 bp. Vertical axes 
for both graphs in A and B are aligned for cross-referencing. 
Figure 13. The Dα motif is an N-Dβ necro-element that was exapted into a Su(H) binding site.

A) Alignment of the lineage-specific consensus motifs for Dα shows that the portion overlapping the Su(H) binding
site is the least divergent. The second half of the Dorsal binding site is also increasingly degenerate
(black struck-out letters) in comparison to other lineages. Such a signature of divergence is characteristic
of drift. Based on this pattern of divergence and the activities of more recent NEEs, we conclude that Dα
is non-functional and represents a deprecated Dβ site exapted into SUH. Also shown are the wild-type and
mutated sequences of this site tested in the NEE_rho backbone from D. melanogaster. B-C) Relative
activities of NEE_rho-driven reporters differing by the presence (B) or absence (C) of the Su(H) binding
site, via a mutation that leaves the Dorsal site intact. The SUH element is required for activity levels but 
not the precise Dorsal concentration threshold encoding. This suggests that Su(H) acts after Dorsal and 
Twist threshold-activation. 
Figure 14. Canonical NEEs from D. ananassae are functional in D. melanogaster embryos.

See text. 
Figure 15. A newly evolved NEE at the Delta locus of D. ananassae has not yet
accumulated a necro-element cluster. 

A) The genome of D. ananassae contains a recently-evolved enhancer NEE_Delta as well as older, canonical
NEEs, such as NEE_vnd (shown). Dorsal cis-spectra are associated with the canonical NEEs but not with
the NEE_Delta sequence, despite employing the essential NEE logic of an E-to-D encoding that is near a
Su(H) binding site. Brackets in the NEE_Delta sequence indicate the boundaries of the fragment tested in
D. melanogaster and shown in B. B) The NEE_Delta module from D. ananassae drives a narrow stripe of
expression spanning the ~5 nuclei of the mesectoderm and ventral neurogenic ectoderm in D. melanogaster 
embryos. C) The SUH element does not overlap a ghost Da site. This suggests that the SUH element in 
this recently-evolved NEE sequence is the original site that has not yet needed to re-evolve or track closer 
to the latest, functioning E-to-D encoding. CA-satellite is defined here as sequences matching two 
CA-dinucleotide repeats or longer (given by the UNIX regular expression: A?(CA){2,}C?).
Figure 16. Dynamic deprecation produces necro-element clusters over time.

The evolutionary maintenance of precise threshold encodings via dynamic deprecation and re-selection of 
replacement encodings can be inferred for Neurogenic Ectodermal Enhancers (NEEs). This process 
produces necro-element clusters (starred, faded boxes) during the course of lineage evolution. Depicted are 
binding elements for Dorsal (blue), Twist (green), and Su(H) (red). Spacer elements (orange) separate the 
Dorsal and Twist elements by a fixed distance, whose length determines the precise threshold encoding 
required for a given embryo type occurring during lineage evolution. Because genes such as vnd, which are 
expressed in the neurogenic ectoderm, must be expressed over the same number of cells despite evolutionary 
changes in the size of the embryo (right column), selection will favor NEEs with new, compensatory 
threshold encodings (left column). There are multiple other reasons for selecting new threshold encodings, 
but these are not depicted here for simplicity. New encodings arise either by selection on variant spacer 
lengths (e.g., evolution of threshold #4) or by the selection of new replacement sites defining preferred 
spacers (thresholds #1-3). Su(H) sites in particular can also be exapted from relic Dorsal necro-elements 
when selection favors proximity to the current encoding (see threshold #4). Over time, these processes 
produce a cluster of necro-elements at an enhancer. Increasingly, this prominent signature heavily 
influences future evolutionary kinetics. 
Tables 

Table 1. Specialized Dorsal motifs in Drosophila NEEs. 

Species            Motif     Consensus over canonical NEEs
D. melanogaster    SUH/Dα    CGTGGGAAAWDCSM
D. melanogaster    Dβ        NVVSGGAAABYCCM
D. ananassae       SUH/Dα    CGTGGGAAWWDCBM
D. ananassae       Dβ        BSVNGGAAABYCCC
D. pseudoobscura   SUH/Dα    CGTGGGAAWWWHBV
D. pseudoobscura   Dβ        BSMSGGAAABYCCH
D. willistoni      SUH/Dα    YGYGGGAAWWDCSM
D. willistoni      Dβ        DKVSGGAAABYCCH
D. virilis         SUH/Dα    CGTGGGAAWWWVBV
D. virilis         Dβ        KNVSGGAAABYCCH

DNA consensi for the indicated elements of canonical NEEs in each species are listed in IUPAC code. 
Canonical NEEs are located in vnd, rho, vn, brk loci. Underlined letters refer to the more degenerate site of 
two equivalent positions across the Dα and Dβ consensi for that lineage. 
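
Because the consensi in Table 1 are written in IUPAC code, they can be turned into regular expressions for scanning a sequence. The block below is a minimal Python sketch of that conversion using the standard IUPAC nucleotide ambiguity codes; the helper names `iupac_to_regex` and `find_motif` and the test sequence are hypothetical, and only the given strand is searched.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]  # raises KeyError on a non-IUPAC character
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_motif(consensus, seq):
    """Return 0-based start positions of all matches, including overlapping ones.

    Assumption: only the given strand is scanned; a full search would also
    scan the reverse complement.
    """
    # A zero-width lookahead lets finditer report overlapping occurrences.
    pattern = re.compile("(?=" + iupac_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # D. melanogaster SUH/Dα consensus from Table 1 against a made-up sequence.
    print(find_motif("CGTGGGAAAWDCSM", "TTCGTGGGAAAATCCATT"))
```

Keeping the translation separate from the search makes it easy to reuse the same pattern for each species' consensus.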
Table 2. List of intact or nearly intact encodings in tested NEEs. 

No.  Length    Species  Enhancer            E(CA)T¹              Spacer  Dβ²
1    878 bp    D. mel.  NEE_rho wt          CACATGT              5 bp    GGGAAATTCCC
2    302 bp    D. mel.  NEE_rho wt min      CACATGT              5 bp    GGGAAATTCCC
3    302 bp    D. mel.  NEE_rho SUHΔ        CACATGT              5 bp    GGGAAATTCCC
4    912 bp    D. mel.  NEE_vn sp -1 bp     CACATGT              4 bp    CGGAAATTCCC
5    913 bp    D. mel.  NEE_vn wt           CACATGT              5 bp    CGGAAATTCCC
6    914 bp    D. mel.  NEE_vn sp +1 bp     CACATGT              6 bp    CGGAAATTCCC
7    915 bp    D. mel.  NEE_vn sp +2 bp     CACATGT              7 bp    CGGAAATTCCC
8    918 bp    D. mel.  NEE_vn sp +5 bp     CACATGT              10 bp   CGGAAATTCCC
9    947 bp    D. mel.  NEE_vnd wt          ACACATGT             10 bp   GGGAAACCCCA
                                            CACATGTTG            20 bp   GGGAAAACCGG
10   300 bp    D. mel.  NEE_vnd wt trunc    ACACATGT             10 bp   GGGAAACCCCA
11   266 bp    D. mel.  NEE_vnd wt trunc    CACATGTTG            20 bp   GGGAAAACCGG
12   657 bp    D. mel.  NEE_brk wt          CACACATGTGTGTTTG     15 bp   GGGAAAGCCCC
                                            CAACACATGTT          21 bp   GGGAATGTCAA
13   651 bp    D. mel.  NEE_brk sp -3 bp    CACACATGTGTGTTTG     12 bp   GGGAAAGCCCC
                                            CAACACATGTT          21 bp   GGGAATGTCAA
14   553 bp    D. mel.  NEE_sog wt          CCACATGTGT           7 bp    CGGAAATTCCC
15   738 bp    D. ana.  NEE_rho wt          CCACATGTGT           3 bp    AGGAAATTCCC
16   758 bp    D. ana.  NEE_vn wt           CACATGT              5 bp    CGGAAATTCCC
17   642 bp    D. ana.  NEE_vnd wt          CACACATGTT           11 bp   GGGAAACCCCC
                                            CACATGTGTTGG         40 bp   TGGAAAAACCG
18   946 bp    D. ana.  NEE_brk wt          CACACATGTGTsGGTTTGT  15 bp   TGGAAAGCCCC
19   658 bp    D. ana.  NEE_Dl wt           CACATGTTGCTG         3 bp    GGAAAATTCCA
20   843 bp    D. pse.  NEE_rho wt          CACATGTT             6 bp    GGGAAATTCCT
                                            CCCACATGTGTTT        19 bp   GGGAAATTCCT
                                            CCCACATGTGTTT        45 bp   CGGAAATTCCT
21   858 bp    D. pse.  NEE_vn wt           CCACATGTTTGG         5 bp    CGGAAATTCCC
22   1,305 bp  D. pse.  NEE_vnd wt          CACACATGTTGG         11 bp   GGGAAACTCCA
                                            ACACATGTTTTT         10 bp   GGGAA7TCCCT
                                            CACACATGTTGG         28 bp   TGGAAAAACCG
23   859 bp    D. pse.  NEE_brk wt          CACACCACATGTGTGTTTG  15 bp   GGGAAAGCCCC
24   784 bp    D. wil.  NEE_rho wt          CACATGT              6 bp    GGGAATTCCTA
                                            CACACACATGTG         19 bp   GGGAATTCCTA
                                            CACACACATGTG         26 bp   CGGAAATTCCT
25   796 bp    D. wil.  NEE_vn wt           ACAAACACATGT         14 bp   CGGAAATTCCC
26   790 bp    D. wil.  NEE_vn sp -7 bp     CAAAACACATGT         7 bp    CGGAAATTCCC
27   964 bp    D. wil.  NEE_vnd wt          CACACATGTTG          11 bp   GGGAAACCCCA
28   960 bp    D. wil.  NEE_vnd sp +E(CA)T  CACATGT              7 bp    CGGAAAAACCG
                                            CACACATGTTG          11 bp   GGGAAACCCCA
29   748 bp    D. wil.  NEE_brk wt          CAACACATGTGTTTGGGTG  13 bp   GGGAAAGCCCC
30   742 bp    D. wil.  NEE_brk sp -6 bp    CAACACATGTGTTT       7 bp    GGGAAAGCCCC
31   726 bp    D. vir.  NEE_rho wt          CCACATGTG            7 bp    CGGAAATTCCT
32   828 bp    D. vir.  NEE_vn wt           CCACATGTTTGTG        6 bp    CGGAAATTCCC
33   1,011 bp  D. vir.  NEE_vnd wt          CACACATGTTG          8 bp    GGGAAACCCCA
34   756 bp    D. vir.  NEE_brk wt          CACATGTGTTTGG        12 bp   GGGAAAGCCCC

1. CA-satellite extending from intact E(CA)T elements is shown when present. Fragmented CA-satellite and their loosely 
coordinated Dorsal spectra are not shown. Likely deprecated encodings are italicized. 

2. Dorsal sites are written with the best halfsite on the top strand. Dβ sequences departing from species' consensi are 
indicated with a tilde. 
Table 3. CA-satellite content in Drosophila genomes and their canonical NEE sets. 

                                   D. melanogaster    D. willistoni     D. virilis
                                   release 5.22       release 1.3       release 1.2
Total DNA in assembly              162,370,174 bp     223,610,028 bp    189,205,863 bp
% CA-satellite - genome            3.9%               4.0%              4.5%
% CA-satellite - canonical NEEs    5.3%               7.7%              10.0%
% E(CA)T - canonical NEEs          1.3%               1.5%              1.8%

CA-satellite was defined as CA-dinucleotide repeats of 2 or more with an optional single nucleotide extension 
of the repeat pattern at either end. Canonical NEE sequences for vnd, rho, vn, brk loci were extracted 
±480 bp from Dβ. 
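
The percentages in Table 3 correspond to the fraction of bases covered by CA-satellite in a sequence. The sketch below shows one way such a figure could be computed in Python. It assumes that a single 0-based index stands in for the Dβ position, that the window is clipped at the sequence ends, and that only the given strand is scanned; the function names are hypothetical and this is not the authors' pipeline.

```python
import re

# CA-satellite as defined in the table note: two or more CA dinucleotides with
# an optional single-nucleotide extension of the repeat at either end.
CA_SATELLITE = re.compile(r"A?(?:CA){2,}C?")


def extract_window(seq, anchor, flank=480):
    """Return the subsequence within `flank` bp on either side of `anchor`.

    Assumption: `anchor` is a 0-based index standing in for the Dβ position;
    the window is clipped at the sequence ends.
    """
    start = max(0, anchor - flank)
    end = min(len(seq), anchor + flank)
    return seq[start:end]


def percent_ca_satellite(seq):
    """Percentage of bases in `seq` covered by CA-satellite matches."""
    if not seq:
        return 0.0
    covered = sum(m.end() - m.start()
                  for m in CA_SATELLITE.finditer(seq.upper()))
    return 100.0 * covered / len(seq)


if __name__ == "__main__":
    # Made-up 10 bp sequence with a single 4 bp CA-satellite run.
    print(percent_ca_satellite("CACATTTTTT"))
```

The same `percent_ca_satellite` function can be applied to a whole assembly or to an extracted enhancer window, which is the comparison the table draws.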



